From TREC to DUC to TREC Again
Abstract
The Document Understanding Conference (DUC) uses TREC data as a test bed for algorithms for single and multiple document summarization. For the 2003 DUC task of choosing relevant and novel sentences, we tested a system based on a Hidden Markov Model (HMM). In this work, we use variations of this system on the tasks of the TREC Novelty Track for finding relevant and new sentences. Our complete information retrieval system couples a query handler, a document clusterer, and a summary generator with a convenient user interface. For the TREC tasks, we use only the summarization part of the system, based on an HMM, to find relevant sentences in a document and we use linear algebra techniques to determine the new sentences among these. For the tasks in the 2003 TREC Novelty Track we used a simple preprocessing of the data which consisted of term tokenization and SGML DTD processing. Details of each of these methods are presented in Section 2. The algorithms for choosing relevant sentences were tuned versions of those presented by members of our group in the past DUC evaluations (see [5, 8, 15] for more details). The enhancements to the previous system are detailed in Section 3. Several methods were explored to find a subset of the relevant sentences that had good coverage but low redundancy. In our multi-document summarization system, we used the QR algorithm on term-sentence matrices. For this work, we explored the use of the singular value decomposition as well as two variants of the QR algorithm. These methods are defined in Section 4. The evaluation of these methods is discussed in Section 5.
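The idea of choosing a subset of relevant sentences with good coverage but low redundancy can be illustrated with a small sketch. The abstract does not say which QR variants were used, so the code below assumes a generic greedy pivoted QR (Gram-Schmidt with column pivoting) on a term-sentence matrix. The function name `select_sentences`, the tolerance `tol`, and the toy matrix are hypothetical and are not taken from the paper.

```python
import math

def select_sentences(term_sentence, k, tol=1e-12):
    """Greedy pivoted-QR style selection of up to k sentences.

    term_sentence: list of rows (one per term), each row holding one
    weight per sentence. Returns the chosen sentence indices in the
    order they were picked. Illustrative sketch only, not the
    authors' implementation.
    """
    n_terms = len(term_sentence)
    n_sents = len(term_sentence[0]) if n_terms else 0
    # One column vector per sentence.
    cols = [[float(term_sentence[t][s]) for t in range(n_terms)]
            for s in range(n_sents)]
    chosen = []
    for _ in range(min(k, n_sents)):
        # Norm of what is left of each unchosen sentence vector.
        norms = [0.0 if s in chosen
                 else math.sqrt(sum(x * x for x in cols[s]))
                 for s in range(n_sents)]
        pivot = max(range(n_sents), key=lambda s: norms[s])
        if norms[pivot] < tol:
            break  # remaining sentences add no new term content
        chosen.append(pivot)
        q = [x / norms[pivot] for x in cols[pivot]]
        # Remove the chosen direction from every remaining sentence,
        # so sentences that repeat it lose weight in later rounds.
        for s in range(n_sents):
            if s in chosen:
                continue
            dot = sum(a * b for a, b in zip(q, cols[s]))
            cols[s] = [a - dot * b for a, b in zip(cols[s], q)]
    return chosen

# Toy matrix: 3 terms (rows) by 3 sentences (columns).
# Sentences 0 and 1 are identical; sentence 2 uses a different term.
example_matrix = [
    [2, 2, 0],
    [2, 2, 0],
    [0, 0, 1],
]
print(select_sentences(example_matrix, 2))
```

In this toy case the duplicate sentence is skipped because, after the first pick, nothing of its vector remains. A production system would more likely call a library routine for pivoted QR than hand-roll the orthogonalization.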
Related resources
The Development and Evolution of TREC and DUC
The Text REtrieval Conference (TREC) has been running for 11 years now, with 93 participants in the last round of evaluation. This paper chronicles the changes in TREC over that time, emphasizing the evolution in the tasks that were evaluated rather than discussing the results of the specific evaluations. The development of the new Document Understanding Conference (DUC) is also discussed, incl...
An Ensemble Approach for Expanding Queries
In our TREC participation, we used an ensemble approach in query expansion. Query expansion, such as synonym expansion, had shown promising results in medical literature search. On the other hand, some of the 2011 papers reported worse results from expansion. Since there are multiple knowledge sources available and each resource has clear strengths and weaknesses, we tested the combination of t...
Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track
Investigating and prioritizing factors affecting on Human Resource Management Professionalism in Tehran Regional Electric Co.
In today's organizations, an effective human resource management as the most important organizational source- especially in critical industries such as electricity industry- is not achievable unless having a systematic, specialized and professional perspective. Professionalism is a perfect and holistic view to a work or working situation that it encompasses a comprehensive mastery in affairs an...
A Hidden Markov Model for the TREC Novelty Task
The algorithms for choosing relevant sentences were tuned versions of those presented in the past DUC evaluations and TREC 2003 (see [4, 5, 10] [11] for more details). The enhancements to the previous system are detailed in Section 3. Two methods were explored to find a subset of the relevant sentences that had good coverage but low redundancy. In the multi-document summarization system, the QR...
Publication date: 2003